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ABSTRACT 



• This paper addresses the problem of screening potential 
variables for entrance in a linear multiple regression set- 
ting. The purpose of the work presented here is to propose 
two screening methods, both of which have roots in principle 
component analysis, and which evaluate a combination of 
variables in an efficient enough manner so that enumeration 
of all combinations is feasible even when the number of po- 
tential variables is quite large. Using the square of the 
multiple correlation coefficient as the criterion, the se- 
lections made by these methods in several test cases are 
evaluated, and compared with the selections made by the 
methods of total enumeration and stepwise regression. The 
paper concludes with overall evaluations of the two methods 
and suggests directions for further study. 
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I. INTRODUCTION 



The search for the important variables in a multiple 
linear regression setting is, due to its great practical 
importance, an area which has received considerable atten- 
tion. Researchers often use a large number of potentially 
important variables in the exploratory stages of their work. 
These need be screened in order to determine the most par- 
simonious subset of these variables available for predicting 
or estimating the response with an acceptable level of er- 
ror. It appears that a total enumeration of all combina- 
tions of variables [Ref. 1] is necessary to be assured of 
making the best selection. This process is computationally 
overwhelming when the number of variables becomes even mod- 
erately large. (Each combination requires a matrix inver- 
sion, and there are 2^ such inversions to perform if p is 
the number of variables.) 

A highly popular approach to this problem is the use of 
stepwise regression [Refs. 2, 3, and 4], It is basically 
a one-step look-ahead method, and uses significance tests 
based on distributional assumptions to judge the combinations 
of variables under consideration at each step. It appears 
to do an adequate job of selection, and is readily available 
in packaged form (specifically BMD and SPSS). 

The purpose of the present work is to present and study 
some alternatives to stepwise regression in hopes of finding 
viable competitors. It is desirable that these methods 



7 



should perforin at least as well as stepwise regression, yet 
remain feasible computationally. In this regard the method 
of principle component analysis of the antecedent variables 
is useful and serves as a guide. 

This paper will pursue the development of these alter- 
natives in the following manner. A discussion of the general 
problem will be presented first, along with remarks concern- 
ing ways of measuring the effectiveness of the combinations 
of variables. Comments concerning several suggested screen- 
ing methods will follow, being followed in turn by a discus- 
sion of stepwise regression. A development of the alternatives 
under consideration in this paper will be presented, and a 
comparative evaluation of their performance on some test 
cases will conclude the work. 
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II. THE LINEAR REGRESSION MODEL 

In order to introduce the linear model, it is convenient 
to agree to a common notation. Let y be the dependent, or 
response, variable, and x^,X 2 ,...,x be the control, or an- 
tecedent, variables. Assume that N sets (y ,x^ ,x^ , . . . ,x^) 
are observed, and for convenience, let each member of the 
set: be replaced with its deviation from the sample mean: 

y j ^ frj'y) and x ij ( x ij ‘ x -j.) j = l,...,N. (2.1) 

The response y is, of course, viewed as random. The an- 
tecedent variables may be either random or deterministic; 
it does not matter which. We are concerned only with the 
question of which subset of them should be permanently col- 
lected and not with formal statistical inference per se. 

When means, variances, covariances, and correlations are in- 
troduced, they refer only to the sample quantities. Further, 
no distributional assumptions are made about y. Thus de- 
cisions regarding the appropriateness of the various com- 
binations of variables are structured on ad hoc data analysis 
groundsand not on formal tests. 

N 

Using the column vector Y to denote the {y-}^, and the 

N 

matrix X for the N sets of {(x, .,..., x .)}-., the usual lin- 

lj PJ 1 

ear model 

Y = X 3 + e (2.2) 

Nxl Nxp px] Nxl 

is assumed, where 3 represents the vector of regression co- 
efficients and c the vector of residuals. The estimation of 
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$ is achieved by the method of least squares, the solution 
being [Refs. 2 and 5]: 



0 = (X ' X) ~ X X ' Y 



(2.3) 



assuming X has full rank (as it generally will in data 
analysis situations) and the prime denotes transpose. The 
correlation matrix of the x^,...,x variables is introduced 
as 



C 



pxp 




(2.4) 



The computational problem associated with total enumera- 
tion can now be made more explicit. Consider a subset of 
size q, (x^ ,...,x^ ) of (x^,...,x ). There are (Jl) such 

subsets, and for each of these a covariance matrix (a minor 

/\ 

of (2.4)) must be inverted to produce the corresponding 2 
(Eqn. (2.3)). This done, one must choose the best (in some 
sense) subset of size q and do this for each q=l,...,p. 

A number of criteria for judging the fit of a subset 
of variables are available, including multiple correlation, 
standardized total squared error [Ref. 6] , and variance of 
residuals. The approach taken here is to chart the growth 
of the square of the multiple correlation coefficient, R 2 , 
as a function of q. This can be done for total enumeration 
(using that subset (x. , . . . ,x. ) that maximizes R 2 for each 

1 i 1 q 

q) , for stepwise regression, and the two alternatives to be 
introduced. Once such charts are made, the user may choose 
q to meet his own needs. The researcher hopes that R 2 grows 
very rapidly and becomes very close to its maximum (achieved 
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when q=p) for small q. The worse case occurs when R 2 grows 
linearly with q. 

This approach seems simple and reasonable. No formal 
significance tests are made and the tenant difficulties 
connected with simultaneous inference are not addressed. 
Although the method of stepwise regression uses formal sig- 
nificance testing in its intermediate stages, the results 
of using it can still be compared using our simple ad hoc 
approach. 

Remark: In recent work with the method of ridge regres- 

sion [Ref. 7] the use of unbiased estimators Eqn. (2.3) is 
foregone and more general measures based on average squared 
error are used. In this method the principle diagonal of 
the covariance matrix Eqn. (2.4) is loaded in an effort to 
trade bias against a smaller mean squared error. Comparison 
of the methods presented here with those of ridge regression 
is not considered in this thesis. 

To present computational formulae for the measures of 

effectiveness discussed previously, additional notation is 

convenient. Let s be the pxl column vector of covariances 

between y and (x, , . . . ,x ), and let c be the variance of y. 

j v 1 . ’ p J * yy ' 

The variance of residulas can be estimated as 




I (Y-XB)'-(Y-Xg) 



(2.5) 



where 



e = Y-X3 



( 2 . 6 ) 



is the estimate of residuals. The square of the multiple 
correlation coefficient is [Ref. 5] 



1 



(2.7) 



R 2 = 



a z / c =s'C^s/c 

yy yy 



and C is given in Eqn. (2.4). When the model is reduced to 
(x. x. ), the matrices and vectors must be modified 

accordingly. 

Three test cases are introduced to evaluate the methods 
under consideration (Tables I-III). There the pertinent 
quantities C, s, and c are presented as augmented matrices 
in the format 



yy 

s 



The first test case was generated specifically to expose 
a weakness of the stepwise approach, as will be seen. The 
second matrix was designed to be of sufficient complexity to 
give a more enlightening comparison of the methods under con- 
sideration, yet small enough so that the total enumeration 
solution could be obtained. Care was taken to ensure that 
the criterion of positive definiteness (see Appendix B) w r as 
met . 

Because the first two examples are artificial in nature, 
an application using real data was sought. Such was ob- 
tained from a study currently underway [Ref. 8], There, a 
survey concerning subjective reactions to a set of fourteen 
drugs is being made to ascertain how the subjects perceive 
the various drugs. Each drug is rated from one to seven on 
each of the following fourteen scales; violence, growth, 
sharpness, destruction, ehancement, activity, goodness. 
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avoidance, integration, positivity, permanence, speed, 
severity, and strength. 

One of the studies within this investigation is to de- 
scribe the scale "severity" in terms of the other scales. 
More specifically, what subset of the other scales "best" 
describes severity? This problem presents an opportunity 
to compare the screening methods being presented here with 
stepwise regression, and provides a third test matrix. 

This data is rounded to one digit to conserve space, and 
in this form may not be positive definite. See Table III. 

Of course, the original matrix was used in the computations. 
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III. CURRENTLY USED SCREENING PROCEDURES 



There are a number of methods currently used to screen 
variables, among them forward selection, backwards elimina- 
tion, stepwise regression, and several graphical techniques 
[Ref. 6]. Because stepwise regression incorporates the best 
of two of the above methods, enjoys general acceptance, and 
is readily available as a packaged program, it will serve 
as a baseline for measuring the performance of the methods 
under study here, and will be discussed in greater detail. 

Stepwise regression is based on an underlying assumption 
of normality for the response variable y. As a result of 
this, sums of squares from several sources (including re- 
sidual error, regression, and reduced model) have Chi-Square 
distributions. Thus at any step, a potential new variable 
may be tested for its significance if allowed to enter the 
system. Among those eligible to enter, the most significant 
(in terms of the F statistics being formed) is selected. 

Then those variables entered at previous steps are tested 
to determine whether their presence remains significant. 
Again, statistics are used as the criteria for deletion. 

When no variables can either enter or leave, the process 
terminates. Stepwise regression has the ability to look 
ahead only one variable at a time, and thus cannot guarantee 
an optimum solution. 

One of the most curious characteristics of stepwise re- 
gression as it is used in the packaged programs available 



IS 



(specifically the BMD and SPSS regression packages) is the 
set of critical points for the F-statistic used by the com- 
puter as criteria for entrance of variables. In order to 
reduce core requirements these packaged programs allow a 
single value from the F table to be used as the criterion, 
despite the change in degrees of freedom required each time 
a new F-statistic is formed. Thus what may be a test of 
significance at the level a for one F-statistic with p and 
q degrees of freedom will not remain so for a new F-statistic 
with r and s degrees of freedom. As a result, there is no 
easy way to control the actual level of significance of each 
test performed (much less the overall level of significance 
of the final combination chosen with the multiple tests). 
Further, the default values (for entering a variable) in 
both the SPSS and BMD regression packages are set at 
F . -.=0.01. While these may be changed bv the user, manv 

users are unaware of the problem and rely on the default 
value. This default value is not one which most users would 
initially choose, as for most F tests this would correspond 
to a significance level close to one. (In fairness to step- 
wise regression it should be stated that the problem is with 
the packaged programs, not with the method itself.) 

Two advantages to stepwise regression are worth noting. 
The first is that it only needs to look at a small subset of 
the total number of combinations before terminating, and 
therefore relatively large problems become computationally 
feasible. The second is that when the underlying assumptions 
are met the results of its screening seem to be generally 



quite good. 
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There are a number of disadvantages to stepwise regres- 
sion. The first is that it is extremely difficult to "see 
into" the method and understand what it is doing at any step. 
The' computer printout is confusing, and many users may be 
only dimly aware of what is happening. Consequently, the 
results obtained may often be unsatisfactory, as the user 
delegates too much analysis to the computer program. 

A second disadvantage is that of the underlying distri- 
butional assumptions. Robustness to departures from the 
normality assumption becomes an issue, as well as the pre- 
viously stated problem of simultaneous inference. 
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IV. SCREENING OF VARIABLES 



The method of principle components [Refs. 2 and 5] ap- 
plied to the antecedent variables provides a platform for 
introducing the screening methods presented in this paper. 
Further, the growth of the multiple correlation coefficient 
when the principle components are used as antecedent vari- 
ables is easily developed and serves as a rough standard 
for the kind of growth that may be available using the orig- 
inal variables. 

The mathematical structure is useful in that the compo- 
nent variables are orthogonal (uncorrelated), and their 
variances are readily obtainable, as are their correlations 
with the response variable. The matrix of eigenvectors 
serves to expose those antecedent variables which exert 
most influence over the orientation of the original data. 

Thus it seems reasonable that useful screening methods can 
be obtained by first finding the important principle com- 
ponents, and then the important original variables that in- 
fluence these components . 

General developments of the method of principle components 
are readily available (see for example [Refs. 2 and 5]). The 
salient properties used herein arc presented below and, for 
sake of immediate reference, developed in Appendix A. 

The rotation of the vector of antecedent variables x to 
the principle components vector v may be expressed as follows: 
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V 



W'x 



(4.1) 



where the columns of W, (w^ > • • • > w p) > are the eigenvec- 

tors of C (Eqn. (2.4)). The covariance matrix of v is di- 
agonal with the variances being the eigenvalues. The 
covariance of a typical v^ with the response y can be cal- 
culated as 



Cov(y,v.) = E (yv i ) = w!E (yx) = w|s 



(4.2) 



and hence the correlations are 



w! s 
i 



y,v i = 



/X . c 
i yy 



— — . 1 . W../C7T 

/rrJ =1 J 1 u 

l 



'y,x. 



(4.3) 



where w. - are the elements of W and C. . the elements of C. 
iJ 

When the principle components are considered as being 
the antecedent variables , it is an easy task to chart the 
square of the multiple correlation coefficient R 2 as a func- 
tion of the number of variables q. One need only order the 
(v^,...,v ) according to the magnitudes of their correla- 
tions with y (given by Eqn. (4.3)): 

> r 2 .. . (4.4) 



r 2 > r > • 

•y>v, y,v ? 



y,v 



It follows from (2.7) that 



<1 <1 i 

R 2 = .E- r 2 = .E. yi 
pc i=l y , v. i=l X 



l l_ 



. Z .. w • ■ /C . . r 
3=1 3i 3 3 y> x -j 



1 ^ 1 
-i- . E, y~ 

c 1 = 1 X . 

yy i 



. Z . w . . s - 
3 “ 1 3i 3 



(4.5) 
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A. FIRST SCREENING METHOD (Ml) 



Our goal is to select the best subset of size q from 
where maximization of R 2 is the criterion. 
Tke best q principle components are already in hand from 
(4.4). It seems reasonable to try to march this vector 
with the "closest” q-dimens ional flat in x-space. First 
we will introduce some notation: the new rotation to the 

subset of q principle components may be denoted 



V l~ 




w ll» 


• • • • W -| 

pi 




X 1 


• 


11 


• 






• 


1 — ‘ 

c • 

& 




W. , 
lq 


.... w 

p q _ 




X 

Pj 

rnmmm m i 



That is , 

v = W*' x 
qxl qxp pxl 



(4.6) 



(4.7) 



Note that W* consists of q eigenvectors, each of which is 
complete (that is, the deletion is among the eigenvectors, 
not across them) . 

We now have q of the principle components represented 
as linear combinations of all p of the {x^}. The second 
step is to reduce the number of x^ required. Note that the 
act of dropping some of the x^ may be viewed as the removal 
of the corresponding columns of W* ’ and their replacement with 
columns of zeroes. Thus 



v** = W* * ' x . (4.8) 

qxl qxp pxl 
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Finding the "closest" subset mentioned above will be 



accomplished by 



min E | | v* - v** | | 2 



(4.9) 



where the operator E refers to averaging. Such a minimiza- 
tion must take place for each q=l,...,p. The solution to 
this problem will be referred to as method Ml. 

In terms of the elementary quantities, (4.9) may be 
written as 



In order to describe the minimization process of (4.9), 
let S be a set of q subscripts of the variables considered 
for inclusion in the regression, so that its complement, S, 
contains subscripts of those variables to be excluded. Then 
(4.10) becomes 



E | | v* -v* * | | 2 = E | | (W*' -W**')x| | 2 




2 



(4.10) 




E | | v*-v** | | 



q 

2 = Z E_ Z_ 
. i=l j eS keS 




(4.11) 



Further, let Q = (Q^ , 



••• 9 



Q ) where 
V P 




0 if i S 



1 if i S 



(4.12) 



Finally let Z be defined 



pxp 
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(4.13) 



( z ) kj - iSi w ji w ki . 



It follows that (4.10) can be represented as 

P P 



(4.14) 



E I I v*-v** I I 2 = Z Z Z . . C .. = .Z- Z C.,(l-Q.) (1-Q. ) 

jeS keS J k J k J = 1 W V k 

The closest v** may be bound by forming all (^) subsets S of 
size q, then computing (4.14) for each, and choosing the 
smallest . 

It is useful to express this algorithm in another way. 

From (4.10) we obtain 



E | | v* -v** | | 2 = 



q p p 

.z, . z, . z, (W* * -w** ' ) - - (w* • -w** 1 ) . , c . , 

i=l j=l k=l ^ ij lk jk 



(4.15) 



P P 

j=i k=i C( w *' ' ( w * ! -w**’)) - k c jk , 



Since 



(W**') . . 



(W* T ) . .0. 



(4.16) 



we see that 



(W*» -W** •) . . = < 
il ' 



0 if Xj is to be included 



w. • if x. is to be excluded 
ij 1 



yielding a form that is easily computerized. 



(4.17) 



B. SECOND SCREENING METHOD (M2) 

The rationale for the second method begins with the ob- 
servation that when (4.5) is used to calculate R 2 (at q=p) , 

J max ^ 1 1 J ’ 



R 2 

max 



_i_ ? J- 

c i-1 A . 
yy i 



P 

. Z . w . . s - 
1=1 Ji 1 



(4.18) 
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one can decompose the inner summation into two parts; one 
associated with the variables to be kept and one with the 
variables to be deleted. The former is indexed by the set 
S and the latter by its complement S’. Letting 



t . . = 



w . . s . 



v /c XT J 1 j 

yy i 

expression (4.18) can be written 



i j j 1 » • • • > P 



(4.19) 



R 2 

max 



p r 1 

■ £ , - t . - + .Ep t- • 

i=l jeS ij jeS lj I 



(4.20) 



= R 



* 2 



+ - E, {( E t. .) ( E t. .) + ( E t. -) 2 }. 
1=1 o i] n 5 i] ; ^ c ir 



,A2 



This serves to define R for each set S of indices of vari- 
ables to be kept. It seems reasonable that a set S that 
maximizes R* 2 should define a good set of variables to re- 
tain. The process of determining the set S will be referred 
to as method M2. 

The maximization problem may be couched nicely in mathe- 
matical programming notation as follows: 



maximize R 



*2 



= . E, .2, t . .0. 

1=1 U =1 J J 

P 



(4.21) 



. E .. Q . = q . 
1=1 T 1 



subject to 

The expression also lends itself well to consputerization. 

As in method Ml, we must generate all possible Q vectors 
and compute all possible candidates for (4.21), ultimately 
choosing the largest for each value of q. 

It is interesting that this method can be developed from 
another point of view. Prom (2.7) the square of the 
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multiple correlation may be expressed as 



R 2 

max 



'yy 



s , C* 1 s = 



-- Trace [ss * C *] . 



'yy 



(4.22) 



In hopes of finding a viable screening method, we again con- 
sider a subset S of subscripts, and limit the above trace 
computation to elements of S. To be more explicit, let 



D = Diag (Q 1 , . . . ,Q ) . (4.23) 

Then let us consider screening the vector s with D by looking 
at the pxp matrix Dss'D. We may think of this as a matrix 
which has been stamped with a grid so that non-zero elements 
can be found only in those rows and columns both of whose 
indices belong to S. Comparison of all possible "stamps" 
of order q allows us to choose that combination which maxi- 
mizes the trace in (4.22) Equation (4.22) becomes [Ref. 9] 

(4.24) 

R* 2 = — Trace(Dss ’DC -1 ) = — Trace (ss ’ DC" 1 D) . 
c c 

yy yy 

However, we can show that this is equivalent to method 
M2 as follows. Let 



Apxp = E(vv') = E(W'xx'W) = W’E(xx’)W 
= Diag(X 1 , . . . ,A ) 
and notice that 

C = WAW' and C _1 = WA' 1 W* 

properties that follow from the orthogonality of W. It fol- 
lows that 

Trace [ss • DC _1 D] = Trace [ss • DWA" ] W ' D] . (4.27) 



= W'CW 

(4.25) 



(4.26) 
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- Since 



(WA -1 W' ) r j - k i! W rk w jk /X k 



(4.28) 



it follows that 

(mu-hvB) ri = J 1 w rk w jk Q r Q j /x k . 

Since (ss') . = s.s , (4.27) becomes 

v ; ir 1 r* v J 



(4.29) 



P P P 

Trace[ss'DW 1 W'D] = E. .E, ,E. s.s w ,w..Q Q./A. 

L J r=l 3=1 k=l 3 r rk 3 k x r x 3 k 

(4.30) 

P P P P P 2 

: E, .E, , E, ti -t, Q Q. =c t Ei [.E, t, .0.1 

yy r=l j=l k=l k 3 kr x r x 3 yy k -1 l 3 =l k 3 x 3 J 



= c 



using (4.19). Comparison of this with (4.20) completes the 
proof. Hence method M2 may be thought of as an attempt to 
approximate the inverse of a qxq minor of C by a stamped ma- 
trix DC ^"D. 
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V. RESULTS AND CONCLUSIONS 



A. RESULTS 

As was mentioned previously, the multiple correlation 
coefficient is a convenient criterion for judging the ef- 
fectiveness of the combinations of variables selected by 
the screening methods under discussion. Thus the results 
of the screening methods can best be summarized with the 
graphs of multiple correlation versus the number of variables. 

The first test matrix, as mentioned, was designed to 
expose a weakness in stepwise regression. Figure 1 indi- 
cates this quite well. The printout of the SPSS regression 
program may be reviewed in Table IV. Variable x^ was se- 
lected first because it was highly correlated with y. Step- 
wise regression then chose variable x^. Variable x^ remained 
significant, and was left in the equation. Stepwise regres- 
sion then selected variable x^ , found it significant, and 
terminated. It never observed the pair x^jX^ alone, which 
in this case was a much better pair than the one stepwise 
regression chose at step two (x^,Xj). Further, had stepwise 
regression chosen X 2 >Xj,it would not then have selected x^. 
This can be seen in Table IV. The reason for this is as 
follows. The square of the multiple correlation (R 2 ) of 
variables x-^x^ is a great deal smaller than R 2 for x^jX^. 

Thus when stepwise regression considered adding x 7 to the 
set x^,Xj, the sizable increase in R 2 caused it to accept 
the new variable. On the other hand, when It was given the 
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CORRELATION COEFFICIENTS 

A VALUE OF 99.00000 IS PRINTED 

IF A COEFFICIENT CANNOT 8E COMPUTED. 
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Tabic IV. 
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VARIABLE ( S ) ENTERED ON STEP NUMBER 2.. 
R 



MULTIPLE 
R SQUARE 
STANDARD ERROR 



VARIABLE 

XI 

X3 

(CONSTANT ) 



0.64291 
0.41333 
0. 78207 



VARIABLES IN THE EQUATION 



B 
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0.26667 
1 • 33333 



BETA 

0.46667 
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DEPENDENT VARIABLE.. Y 

VARI ABLE ( S ) ENTERED ON STEP NUMBER I.. 



MULTIPLE R 
R SQUARE 
STANDARD ERROR 



VARIABLE 

X 3 
X2 

(CONSTANT) 



0.67420 

0.45455 

0.75410 



B 

0.45455 

0.45455 

0.45455 



BETA 

0.45455 

0.45455 



VARIABLE(S) ENTERED ON STEP NUMBER 2.. 



MULTIPLE R 
R SQUARE 
STANDARD ERROR 



VARIABLE 

X3 

X2 

XI 
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0.49333 

0.73465 



VARIABLES IN THE EQUATION 



* * * * # 
X3 


£ * # * 4 


ERROR B 


F 


0.12901 
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0.12901 


4.273 


X3 




X2 




i 

ERROR B 


F 
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. .V 

' V V 'l* V 

XI 

1 


sV ^ -c, 

*<* 'r* v *r 



B 
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0.33333 
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0.33333 
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STD ERROR B 
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Tabic IV. continued. 
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DEPENDENT VARIABLE.. Y 

VARIABLES ) ENTERED ON STEP NUMBER 3.. 



X2 



MULTIPLE R 
R SQUARE 
STANDARD ERROR 



0. 70238 
0.49333 
0.73465 



VARIABLE 



XI 

X3 

X2 

(CONS' 



ANT) 



VARIABLES IN THE EQUATION 

B BETA STD ERROR B 



0.26667 

0.33336 

0.33333 

0.33333 



0.2 666 7 
0.36335 
0.33333 



0.14210 
0. 1 2368 
0.12368 



7 . 2t>3 
7.263 



MAXIMUM STEP REACHED 



Table IV. continued, 
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opportunity to add x^ to the pair x 2 > x 3 > there was not suf- 
ficient improvement, and was not added. Since stepwise 
regression missed the best pair, it did not terminate until 
all three were included, whereas it could have terminated 
with two variables had it found the best pair. 

Note also that while method Ml falls into the same trap 
as stepwise regression, method M2 selected the same combina- 
tions as did total enumeration for each q=l,2,3. The reader 
will also observe that the principle component multiple cor- 
relation curve serves quite well in this graph, and the two 
that follow, as a standard with which the other methods may 
be compared. 

It is interesting that this curve and the total enumera- 
tion curve are generally very close in the cases presented 
here, and in fact cross each other in the second case. This 
observation is useful since in many applications the total 
enumeration solution is unavailable. 

Figure 2 indicates the results of the various methods 
with the second test matrix. Nore that while the R 2 curve 
for method M2 runs well with the total enumeration and step- 
wise regression curves (identical curves in this case) , the 
curve for method Ml runs consistently lower throughout the 
entire midrange. 

Figure 3 presents the results of the screening methods 
on the third test matrix. This is of particular interest 
because the results should be useful in the study cited ear- 
lier. Note again that the curve for M2 runs quite strongly 
with the curve for stepwise regression, but that the curve 
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Figure 1. Test Matrix One Results. 




Figure 2. Test Matrix Two Results. 
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Figure 3. Results of Test Matrix Three. 



for Ml falls consistently below the others. Throughout the 
midrange of all three test cases, the curve for Ml is about 
two-thirds of the total enumeration curve. 

Because the second method shows greater promise as a 
screening device, more detailed results on its performance 
will be presented. Unlike stepwise regression, this method 
is capable of ordering all combinations of size q according 
to their R* 2 values. (Stepwise regression will typically 
only look at one or two of the combinations.) Thus, Table 
V contains a summary of the combinations selected by the 
second method, M2. 

From this table, the graph in Figure 4 was constructed, 
giving R* as a function of the number of variables. This 
curve seems fairly typical of what can be expected. Note 
that as the number of variables allowed to enter increases, 

R” 2 initially increases. At some point the curve will peak, 
then decrease to the multiple correlation value of the en- 
tire set of variables. This phenomenon can be explained in the 
following way. The value of T^ (see 4.19, 4.20) may be of 
either sign. When the number of variables is small, it is fair- 
ly likely that some combination of variables will be such that 
the will combine with the same sign, so that when squared 

and summed again the value may get quite large. (In the three 
examples used in this paper, the value typically exceeded one.) 
However, as the number of variables increases, it becomes much 
more likely that the t^j's, differing in sign, will conflict 
with each other and reduce the overall values being squared 
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ro 
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OJ 
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Table V. Combinations Chosen by Method 2. 
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and summed. The curve in Fi-y-e 4 seems typical of what 
may be expected. Thus at best th^se values are pseudo- 
correlations, which have been to rank well with, but 

not predict, the ranking of the actual multiple correlation 
coefficients. As a result, this method cannot guarantee op- 
timal selections. 

Table VI gives the rankings of total enumeration and of 
method M2 for each combination of size q=3 and q=4 from 
test case two. In this way, the results of the method may 
be more fully evaluated than to consider only the "best" 
selection made at each step. Using the rankings given in 
the table, Spearman's Rank Correlation test [Ref. 10] was 
applied; at a= . 1 the rankings of total enumeration were 
significantly correlated with the rankings of M2 in both 
cases. This lends credence to the second screening method, 
and also indicates a convenient property of this method; the 
capability to rank all combinations at each step. 

Finally some comments on the use of principle components 
are in order. It appears to provide a useful standard of 
comparison for the other methods presented. Indeed, the 
method may be of more direct use in some applications, es- 
pecially when researchers find that p>N. In such cases the 
coefficients $ may not be directly estimable, but any ver- 

A 

sion of least squares will result in e=0. Principle compo- 
nents may be useful for preliminary screening so that there 
are enough degrees of freedom N-q to obtain an acceptable 
estimate of variance of residuals. 
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0=3 Princ. Components Multiple Correlation 
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B. CONCLUSIONS 



On the basis of the results just presented, a number of 
observations concerning the screening methods under inves- 
tigation in this paper may be made. 

The first method (Ml) does not appear to perform par- 
ticularly well. Its selections were consistently worse 
than those made by the other methods. It may be that its 
disappointing performance is caused by the weak correlation 
of the major principle components with the response variable. 
As was seen in the three examples presented, this method 
does not in general even approach the optimal combinations, 
and thus shows little promise per se as a screening technique. 

Method M2 seems to be a reasonable approach to the prob- 
lem. While it will not in general make optima] selections, 
in the examples used here it has done quite well. 

The method has several advantages which are worthy of 
mention. The first is that after an initial investment in 
obtaining the eigenvalues and eigenvectors of C (roughly 
equivalent to inverting it, in time spent), the amount of 
time required to examine any potential combination is very 
small. As a result of this, enumeration of the combinations 
becomes feasible even when the number of variables is fairly 
large. Appendix C contains some remarks concerning an algo- 
rithm which will enumerate quite efficiently, and which was 

used in the FORTRAN program presented there. This program 

* 2 

outputs the largest value of R' , and the combination that 
produced it, for each value of q. 
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It is worth noting that the manner in which M2 screens 
variables is much more readily apparent than that of stepwise 
regression. The user is aware of the process by which vari- 
ables are screened, and is more able to make intelligent use 
of it . 

C . SUMMARY 

The results presented here are at best tentative, since 
only three test cases are presented. Certainly a wider spec- 
trum of test cases is necessary to establish- conclusive re- 
sults for the methods presented in this paper. Neither stepwise 
regression nor method M2 is optimal, but both remain as vi- 
able competitors which are useful as screening devices. 

Because of the success of M2, we are led to consider the 
possibility of other methods of approximating the inverses 
of the minors of C, which may be as efficient (with computer 
resources) , as those presented yet produce results which 
compare more favorably with total enumeration. 
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APPENDIX A: PRINCIPLE COMPONENTS 



The structure for principle components [Ref. 11] arose 
from a desire to find that location and orientation of axes 
that the variance of the data swarm about them is minimized. 
The necessary location of axes follows from a linear algebra 
theorem which states that a sum of squares centered about an 
arbitrary point is minimized when that point is the cen- 
troid. Thus it becomes convenient to standardize the data 
about their means. In order to define the desired orienta- 
tion, let us first agree to the following conventions. 

Let the data swarm under consideration be in p-space. 

Let the X matrix contain the p coordinates of the N obser- 
vations, and let 



1 N 

N i-1 X ij 



1 N 

and s? = rr • E 



J 



N i=l 



(x. . -x . ) 2 

^ ij • r 



(A.l) 



Then let the X matrix have entries 

(X) . . = ((x. . -x .)/s •) . (A. 2) 

That is, let the X matrix contain the p coordinates of the 
N observations after the variates have been standardized to 
zero mean and unit variance. (Note that unit variance is 
not necessary to what follows.) Then it follows that the 
correlation matrix is 



C 

pep 



1 

N 



X’X. 



The rotation itself may be denoted 



(A. 3) 
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W’x 



(A. 4) 



v 



pxl 



where v = ( ^ , v 2 , . . . , v) * , x = (x-£-,-.xJ , . . . , x ) ' , and W ] 



pxp 



IS 



a matrix column-oriented vectors, each of which provides the 
coefficients for transforming the x vector into one of the 
component directions. 

For p=2, this may be displayed graphically (see Figure 5). 

As stated previously, the goal is to minimize the variance 
about the component axes by choosing the matrix W. However, 
one may do this neatly by setting W=0_, and thus avoid the real 
problem. To obtain a tenable solution we may constrain the 
problem by requiring that the norm is one: w'w=l. The vari- 

ance about the component directions may be written 



Because minimizing the variance about an axis is equiva- 
lent to maximizing the variance along that axis, and because 
S 2 represents the variance along the axes v-^ , our problem 
will be to maximize S 2 . To show that this is true, refer to 
Figure 6. Note that to minimize the variance in the di- 
rection we must maximize along the (orthogonal) v^ direction. 
Orthogonality will be shown later. 

Thus the problem of finding the direction may be 
written 



where w is an arbitrary column of W. This problem may be 
solved conveniently by the use of LaGrangc multipliers. The 




w ' Cw . 



(A. 5) 



maximize w'Cw subject to w'w = 1 



(A. 6) 
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Figure 5. Principle Component Rotation. 




Figure 6. Maximizing the Variance Along . 
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LaGrangian is 



<f>^ = w'Cw - A(w'w-l). 



(A. 7) 



The LaGrangian is maximized at V<£^ = 0 

V4> 1 = 2 (C-AI)w = 0. (A. 8) 

Then 

(C-AI) = 0. (A. 9) 

Now the matrix C is real and symmetric, and will in gen- 
eral be positive definite; it can be shown to be at least 
positive semi -definite as follows: Let Z be a non-negative 

vector, so that Z'CZ - Z'X'XZ = (XZ)'XZ. Now let Y = XZ , 

N 

so that (XZ) 'XZ = Y'Y = y? > 0 for all z- in the vector 

v J i=l ' l - l 

Z. 

As a result, it can be shown [Ref. 9] that C is non- 
singular with real, non-negative eigenvalues. 

We require that (C-AI)=0. If (C-AI) is non- singular , 
the only solution to the simultaneous equations is w=0^, 
which violates the constraint w'w=l. Thus we require that 
(C-XI) be singular, and it follows that 

! C -X 1 1 = 0. (A. 10) 

Then the set of p solutions A^ to this equation must be the 
characteristic values of the matrix C. Pre-multiplying 



equation A. 9 


by w' we obtain 






w' (C-AI)w = w' *0 - 0. 


(A. 11) 


Then 








w'Cw = w'AIw = Xw'w = X. 


(A. 12) 
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But w'Cw = S 2 . Thus X is actually the variance along the com- 
ponent. Since we wish to maximize the variance, the solution 



we desire is the largest eigenvalue. 

Because X is an eigenvalue, (A. 9) implies that w is its 
associated eigenvector, and thus we maximize S 2 by choosing 
the eigenvector associated with the largest eigenvalue as the 
direction . Then call this eigenvector w and its eigen- 
value X 

The direction may be found solving the following equa- 
tion: 

maximize w£Cw^. subject to w^w^ = 1 and wjw k = 0. (A. 13) 

The last constraint says that we require v, and to be 
orthogonal. The LaGrangian is 



*2 = - 0 Cwiw k -l) - 2T( Wi w k -03 


f\A 141 


and 

V<J>2 = 2Cw^ - 20w^ - 2 tw^ = 0. 


(A. 15) 


Then 




(C-0I)w^ - xw^ = 0. 


(A. 16) 


Pre-multiplying by w^ , 




w^(C-0I)w k - w^.w 1 = 0 + w^CC-61) = 0 -y 
w£Cw k = 0w^lw k = = 0 -*■ S 2 = 0. 


(A. 17) 


Thus 0 is one of the eigenvalues of the C matrix. 


and w, its 



associated eigenvector. Now pre-multiply (A. 16) by wj , 



w|(C-0I)w^ - tw'w. = 0 -»• w ' ( C - I)w k = x 



(A. IS) 



w'Cw. - 0w| = x = w jCWj, . 
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• Pre -multiplying (A. 9) by w£., 

w£.(C-XI)Wi = w£*0 = 0 -*■ w^Cw^^ = Xw£w -* w^Cw.^ = 0.(A.19) 

Then - 

w k^ w l =T= ^* (A. 20). 

Note that because t turned out to be zero, the constraint 
that forced the new component to be orthogonal to the first 
component was not binding; that is, w, is inherently orth- 
ogonal to . This follows because C is a real, symmetric, 
positive-definite matrix. Since t= 0, (A. 16} becomes iden- 

tical in form to (A. 9) , so that 0 is an eigenvalue, w^ its 
associated eigenvector. It was shown above that 0 = S£ (6 is 
the variance in the v^ direction), so that it follows natural- 
ly that 0 is the second-largest eigenvalue (let 0=X ? ) . Then 
Wj, becomes w^. 

til 

This argument can be generalized so that X^, the i 

til 

largest eigenvalue of C, is the variance in the i best 
component direction v^ , and the associated eigenvector w^ is 
the set of coefficients mapping the x vector into VT . Now 
let A be a diagonal matrix with the ordered eigenvalues 
(largest to smallest) in the diagonal. The rotation we de- 
sire is then 

v = W'x, 

and the correlation matrix of the v. is A. 
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APPENDIX B: PROPERTIES OF THE CORRELATION 1 MATRIX 



Throughout the course of this paper, and especially in 
the numerical examples, there has been an implicit assumption 
that if a matrix is real, symmetric, and positive definite, 
it qualifies as a correlation matrix, and that it would be 
possible to find real data that would generate such a cor- 
relation matrix. While this is not overly difficult to show, 
it is of general interest, and is not contained in many text- 
books as a complete derivation. 

The mathematical statement of the problem is to find a 
matrix such that X'X/N = C ^ , where C is given. It has 

been shown previously that to qualify as the product of a 
matrix and its transpose, C must be positive semi -definite . 
(If C has full rank, it will be a positive definite matrix.) 
Further, by the definition of correlation between two vari- 
ables, C must be real and symmetric. 

To find the X matrix, the initial step is to find a tri- 
angular matrix T such that T'T = C. It is possible to 

pxp * 

compute the elements of the T matrix by carrying out the 
multiplication (shown in this case for a 3x3): Let T be 

upper triangular, so that 







j C 13 C 23 C 33 



C 



c 



11 



12 



C 



C 



22 



12 



C 



C 



13 



23 



or 
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T T 
A 11 A 11 


T 11 T 12 


T T 
i ll i 13 




c n 


C 12 


C 13 


T T 
1 11 1 12 


T 12 T 12 +T 22 T 22 


T T + T T 
* 12* 13 ‘ 22 1 23 


= 


C 12 


C 22 


C 23 


T T 
A 11 13 


T 13 T 12 +T 23 T 22 


T T +T T +T T 
A 13 13 2323 33 33 




C 13 


C 23 


C 33 

















Thus 



T 11 ' / ^11 T 12 C 12 1 ’ / ^n T 13 C 13 1 



22 



= /C 



r7T3 

22 12 



/C 



C 23 ' C 12 C 13 / C 11 



11 



23 






C 22~ C 12 



/ C 



11 



T 



33 




^15 ( ~ C 23 ~ C 12 C 15 / C ll^ 

C 11 C 2 2 ' C 12 / C 11 



Note that at every step it is possible to solve for 

in terms of the C. . and only those other T. . for which it has 

13 13 

already been possible to solve. This may be done in general 
for a C matrix of any size p. 

Working backwards, we have that 



T'T = 



Then let 



Z = 
Nxp 



Then 



Z'Z = 

Let Z ~ N(0,I) , 
X = 



C = X'X/N. 

X T" 1 . 

Nxp pxp 

(T _1 ) 1 X ' XT " 1 

and wc obtain 
ZT. 



(B . 1) 

(B.2) 
(B . 3) 

(T" 1 )CT“ 1 = (T'^T'TT -1 = I . 

the following transformation: 

(B.4) 



Then 



£ X ' X = T'Z’ZT = T ' IT = T'T = C. 
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Thus having obtained a matrix Z containg N observations 
on a p-variate standard normal, it is possible to transform 
Z to a new set of N observations on a p-variate normal 
(though no longer with unit variance) which will generate 
the -desired correlation matrix through the judicious choice 
of a triangular matrix T. Thus the only requirements on a 
matrix are that it be real, symmetric, and positive-definite 
in order to be a correlation matrix [Ref. 9]. 
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APPENDIX C: FORTRAN PROGRAM OF SECOND SCREENING METHOD FOR 

APPLICATION IN TEST CASE THREE 

Because of the generally good results given by the second 
screening method suggested in this paper, the FORTRAN program 
which was used to apply it to the third matrix is listed here. 
The program does the following: 

1. It reads a correlation matrix (then reverses the last 
two rows and columns, as in this case X13 is to be the re- 
sponse variable) and also reads a standard deviation vector. 
From these it obtains the covariance matrix. (Lines 1 to 40, 
beginning the count with the dimension statement.) 

2. It calls a subroutine (JACVAT) which calculates the 
eigenvalues and eigenvectors of the covariance matrix. (Lines 
34 to 50) . 

3. The equation developed under the first approach to 
the second screening method is used to calculate the multi- 
ple correlation coefficient. The printout, under the label 
"R-square equals," and showing the sums and squares, may be 
used to apply the screening method by hand. (Lines 50-73.) 

4. The covariance matrix's inverse is calculated for 
transfer to the subroutine USER in order to use it with the 
second approach to the second method. (Lines 74-90.) 

5. The next part of the program is the "driver program" 
for the subroutine TWIDDL (which provides the vector of ones 
and zeroes). It provides the initial vector (Q) , and per- 
forms the one-zero exchange. (Lines 93 to 120). 
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6. The TWIDDL subroutine'"--" an algorithm originally 
written in Algol for the Ass k .-.u- ■ n of Computing Machinery, 

.w — 

and is listed as Algorithm 384 in the CACM. Essentially 
it forms every possible combination of m ones and (n-m) 
zeroes, but each new vector requires only the interchanging 
of two positions in the previous vector [Ref. 12]. 

7. The USER subroutine uses the inverse of the covariance 

matrix, the indicator vector Q, and the augmented covariance 

matrix to screen variables using the second approach. For 

each value of m, it continually stores the largest value of 

R 2 that it has calculated to date, and when m increments it 
s 

prints the value of R 2 and the ^ rt.or Q which produced it. 

(Using this program as a backbone, it is a straightfor- 
ward matter to obtain a program which will apply the first 
method to the data matrix.) 
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THE CORRELATION MATRIX IS') 



//LAMBELL J 03 ( 2629 , 1 242 , R L 42 ) , • L AMBE LL • , T I ME = 3 
// EXEC F OR TC LG » HcG I ON. G0= 7 OK 
//FORT.SYSIN DD * 

DIMENSION A(14,14),B(13,13),C(13,13),0!13),SD(14), 
IF ( 14,14) ,GU3) 

COMMON P0rP(13) 

INTEGER HOLOQ ( 13 ) 

• INTEGER PO ,P , X , Y , Z , DONE , Q 
NEW=0 

DO 20 1=1,14 
R EAD ( 5 , 1 0 ) ( A ( I , J ) , J = l,8) 

RE AD ( 5 , 10) 4 A ( I , J ) ,J=9,14) 

FORMAT ( 8 ( 2X , F8. 5 ) ) 

CONTINUE 
DO 30 J = 1 , 14 
T EMP=A ( 13, J) 

A ( 13, J) =A ( 14, J) 

A (14, J)=TEMP 
DO 40 J=l, 14 
TEMP= A ( J, 13) 

A ( J , 1 3) =A ( J , 14) 

A! J,14)=TEHP 
WRI TE ( 6 ,503 
FORMAT ( IX,/////, ' 

DO 70 1=1 , 14 

WRITE (6 ,60M A( I , J ) , J = 1 , 14) 

FORMAT! IX, // , IX, 1 4F9.3 ) 

CONTINUE 

READ( 5, 80) CSD( I) , 1 = 1, 8 ) 

READ! 5, 80) ISD( I ) , 1=9, 14) 

FORMAT ( 8 ( 2X, FS. 5 ) ) 

DO 90 1=1, 14 
DO 90 J=1 , 14 

F ( I , J ) = A ( I t J)*SD( I )*SD( J) 

DO 100 1=1,13 
DO 100 J = 1 ,13 
B ( I , J ) = F ( IfJ) 

CALL JACVAT<B,13,1,D,C,13) 

W RITE (6, 110) 

FORMAT! IX,////, • THE COVARIANCE MATRIX IS') 

DO 130 1=1,14 

WRITE (6, 120) (F( I , J) , J=l, 14) 

FORMAT ( IX,//, IX, 14F9.3) 

CCNT INUE 
WRITE (6 ,140) 

FORMAT! IX,////, ' THE EIGENVALUES ARE') 

WRITE(6 , 150) (D(I) ,1=1,13) 

FORMAT ( IX,//, IX, I3F9. 3) 

WRITc(o,163) 

FORMAT ( IX,///,’ EIGENVECTORS ARE IN THE COLUMNS’) 
DO 180 1=1,13 

WRITE (6 ,170) (C( I, J) , J = 1 , 1 3 ) 

FORMAT ( IX,//, IX, 13F9.5) 

CONTINUE 
DC 190 1=1,13 . 

00 190 J= 1 » 1 3 

B(I,J)=C(J,I)*A(14,J)4-S0(J)/S3RT(D(I)) 

WRITE(6 ,200) 

FORMAT ( IX,////, ’ R-SQUARE EQUALS') 

DO 220 1=1,13 

WRITE(6,21G) ( B ( I , J) , J = 1 ,13) 

FORMAT! IX,// , ' ( • , 12( F8. 5, ' + ' ) ,F8. 5, ' )**Z +') 

CONTINUE 
DO 240 1=1,13 
SUM=0 . 0 
DO 230 J=1 ,13 
SUM=SUM +3 ( I , J) 

SO! I ) = SUM**2 
WRITE (6,250) 

FORMAT! IX,////, ' OR R-SQUARE EQUALS') 

WR I TF. ( 6 , 2oO ) ( SD ( I ) ,1 = 1 , 13) 

2 60 F ORMA T (1X,//,1X,12(F9.6,'+’),F9.6) 



10 

20 



30 



40 

50 



60 

70 



80 



90 



100 



1 10 



120 

130 

140 

150 

160 



170 

180 



190 

200 



210 

220 



230 

240 

250 



50 



SUM=0.0 
DO 270 1=1,13 
270 SUM=5UM+SD t I ) 

WR ITE(6»230) SUM 

280 FORMAT! IX,////, ' GR R-SwCARE EQUALS’ ,F 1 2.9 ) 

N = 13 

. DO 400 M= 1 , 6 
P0=N+ 1 
P ( N + 1 ) = - 2 
DO 320 1=1, N 
IF(I.GT.N-M) GO TO 310 
P ( I ) = 0 
GO TO 320 
310 P ( I ) = I - N+ M 
320 CONTINUE 
DCNE=0 

DO 340 1=1 , N 
IF(I.GT.N-M) GO TO 330 
Q ( I ) =0 
GO TO 340 
330 Q ( I ) = 1 
340 CONTINUE 

350 CALL USER(B,Q,M, NEW, F,HOLDQ, HOLD) 

CALL TW I DDL ( X » / , £ » DON E ) 

IF(DONE.EQ.l ) GO TO 370 
Q ( X ) = 1 
Q (Y J = 0 
GO TO 350 

370 CALL USER(B,Q,M, NEW, F,HGLDQ, HOLD) 

400 CONTINUE ! 

WRITE (6 ,410) ( HOLD Q( I ) , I = i , 13) , HOLD 

410 FORMAT! IX, • Q= 6 ' , 5X , • ( ' , 1 3 12 , • ) • , 5X , • R*2 = • , El 6 . 7 ) 

STOP 
END 

SUBROUTINE TWI DDL ( X ,Y , Z , DONE ) 

C GMMON PO , ? ( 2 0 ) 

INTEGER X,Y,Z, DONE, PO , P 
J=0 

10 J = J + 1 

IF(PU).LE.O) 30 TO 10 
I F ( P ( J- 1 ) . N E . 0 ) GO TO 40 
I=J-1 

20 I F( I.LT .2) GO TO 30 
P ( I )=-l 
1 = 1-1 
GO TO 20 
30 P(J)=0 

Z = l 

x=z 

P( l)=x 

Y = J 

RETURN 

40 IF(J.GT.l) P(J-1)=0 
50 J = J + 1 

I F ( P ( J) . GT . 0 ) GO TO 50 
K = J- 1 
I =K 

60 1 = 1+1 

I Ft PI I ) .NE.O) GO TO 70 
P t I ) =- 1 

GO TO 60 

70 IFt P{ I) .NE.-l ) GO TO 80 
Z=P (K) 

P ( I ) = Z 
X=I 

Y = K 

P C K ) = — 1 
RETURN 

60 IFt I.EU.PO) GO TO 90 
P( J) = P( I ) 

Z = P t J ) 

P ( I ) =0 
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X=J 
Y= I 

RETURN 
90 D0NE = 1 
RETURN 
END 

SUBROUTINE USER ( B , Q , rt , NEW , HO LDO , HOLD) 

DIMENSION B( 13, 13) ,C( 13,13) ,Q( 13) , HOLDQ < 13) ,PSUM< 13) 
INTEGER Q, HOLDS 
DO 10 1=1,13 
DO 10 J=1 , 13 
10 C(I,J)=B( I,J)*Q( J) 

DO 20 I =1 , 13 
P SUM ( I ) =0.0 
DO 15 J=1 ,13 

15 PSUM( I)=PSUM(I)+C(I.J) 

20 PSUM( I ) =P SUM ( I ) **2 
SUM=0 .0 
DO 30 1=1,13 
S UM= S UM + P S UM ( I ) 

30 CONTINUE 

IF(NEW.EQ.O) GO TO 50 
IF(M.NE.NEW) GO TO 80 
NEW=M 
GO TO 60 
5 0 HGLD=NE W 
NEW=M 

60 IF! HOLD . GT . SUM ) RETURN 
HOLD= SU M 
DO 70 I =1 , 13 
70 HOLDQ (I ) =Q ( I ) 

RETURN 
80 L = M- 1 

WRITE (6, 40) L, ( HOLDQ ( I ), 1 = 1 ,13) , HOLD 
40 FORMAT ( 1 X , ' Q= ' , 1 3 , 5X , ' ( ' , 13 12, ' ) • , 5X , ' R*2= • , El 6. 7 ) 

KOLD=G. 0 
N E W=M 
GO TO 60 
END 
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